Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

RNA-Seq Data Analysis ◾ 203

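# Contrast of interest: tumor minus normal (names are columns of the design matrix)
my.contrasts <- makeContrasts(conditiontumo-conditionnorm, levels=design)
# Fit the quasi-likelihood negative binomial GLM
fitq <- glmQLFit(yNorm, design)
# Test for log2 fold changes significantly beyond the threshold of 2
qlfq <- glmTreat(fitq, contrast=my.contrasts, lfc=2)
# GO over-representation analysis for human genes
go <- goana(qlfq, species="Hs")
# Top 20 GO terms, sorted by the p-value for the upregulated genes
topGO20 <- topGO(go, sort="up", number=20)
write.csv(topGO20, file="topGO20.csv")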

Figure 5.28 shows, for each of the top 20 GO terms, the GO ID, the term, the ontology (ONT), the total number of genes annotated with the term (N), the number of those genes that are significantly upregulated (up) and downregulated (down), and the p-values for the upregulated (P.Up) and downregulated (P.Down) genes. The terms are ordered by the significance of the upregulated p-value. Since the p-values are not adjusted for multiple testing, it is recommended to ignore GO terms with p-values greater than about 10⁻⁵.
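The cutoff described above can be applied directly to the table. The following is a minimal sketch in base R, assuming a table with the same column layout as the topGO() output; the data frame go_tab, its row names, and all of its values are made up for illustration and are not results from this exercise.

```r
# Hypothetical stand-in for the table returned by topGO(); the terms,
# IDs, and numbers are invented for illustration only.
go_tab <- data.frame(
  Term   = c("term A", "term B", "term C"),
  Ont    = c("BP", "CC", "MF"),
  N      = c(120, 45, 30),
  Up     = c(40, 6, 3),
  Down   = c(2, 1, 9),
  P.Up   = c(2e-14, 3e-3, 0.4),
  P.Down = c(0.9, 0.7, 5e-7),
  row.names = c("GO:A", "GO:B", "GO:C")
)

cutoff <- 1e-5  # rule-of-thumb threshold quoted in the text

# Keep terms that pass the cutoff in either direction
keep <- go_tab$P.Up < cutoff | go_tab$P.Down < cutoff
go_filtered <- go_tab[keep, ]
print(go_filtered)
```

In a real analysis the same two filtering lines would be applied to the object returned by topGO() (topGO20 in this exercise) in place of the toy table.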

Since this exercise is based on a single chromosome (chromosome 22), we do not expect as much information as we would obtain from analyzing the entire genome. In general, GO analysis tells us which biological processes, cellular components (localizations in the cell), and molecular functions are associated with the upregulated and downregulated genes.
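To illustrate why a single chromosome yields less information, the sketch below computes an over-representation p-value with base R's phyper(), the one-sided hypergeometric test that goana() uses by default. The helper enrich_p() is a name introduced here for illustration (it is not part of limma or edgeR), and all gene counts are hypothetical.

```r
# One-sided hypergeometric p-value for over-representation of a GO term.
# enrich_p() is a hypothetical helper written for this illustration.
#   overlap  : DE genes annotated with the term
#   term_n   : all genes annotated with the term (column N)
#   de_n     : all DE genes in the chosen direction
#   universe : all genes tested
enrich_p <- function(overlap, term_n, de_n, universe) {
  # P(X >= overlap) when de_n genes are drawn without replacement
  phyper(overlap - 1, term_n, universe - term_n, de_n, lower.tail = FALSE)
}

# Made-up counts: identical proportions, but a 10-fold larger gene universe
p_small <- enrich_p(overlap = 4,  term_n = 12,  de_n = 30,  universe = 400)
p_large <- enrich_p(overlap = 40, term_n = 120, de_n = 300, universe = 4000)

cat("small universe p-value:", signif(p_small, 3), "\n")
cat("large universe p-value:", signif(p_large, 3), "\n")
```

With the same enrichment ratio, the smaller gene universe gives a much larger p-value, so fewer terms from a single chromosome can be expected to pass a strict cutoff such as 10⁻⁵.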

In the same way, we can perform KEGG pathway analysis to identify the molecular

pathways and disease signatures (Figure 5.29).

my.contrasts <- makeContrasts(conditiontumo-conditionnorm, levels=design)
fitq <- glmQLFit(yNorm, design)

FIGURE 5.29 KEGG annotation of the significantly differentially expressed genes.